ChIP-seq phantom peaks, or high-occupancy target (HOT) regions as we call them, are parts of the genome with an unusually high number of transcription factor binding sites. These regions appear in many species and are thought to be biologically important because of their high concentration of transcription factor binding. They also overlap with housekeeping gene promoters, and the associated genes are consistently expressed across many cell types. Despite these interesting features, HOT regions are defined mainly from ChIP-seq experiments and lack the typical sequence motifs of the transcription factors believed to bind there.
Having observed that HOT regions share low-level sequence features across species, we investigated whether technical biases in ChIP-seq could at least partially explain false positive signals at HOT regions. Of 22 publicly available ChIP-seq experiments in which the gene encoding the target protein was knocked out, 14 show enrichment even though the immunoprecipitated protein should not be present in the analysed sample. This false positive signal is highest at HOT regions.
The observed ChIP signal arises from a combination of signal sources: antibody binding to the intended target protein (blue) and nonspecific antibody binding, either to non-target proteins (orange) or directly to polynucleotide structures such as R-loops (red). The error (orange + red) is not proportional to the signal from the targeted protein; rather, it depends on the sequence properties, antibody properties and expression characteristics of individual genomic regions. The combination of these noise profiles results in a subset of ChIP-seq peaks being false positives.
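The decomposition above can be illustrated with a small toy model. This is only a sketch and not our analysis pipeline: it assumes the three sources simply add up, and the region names, signal values and peak threshold are hypothetical numbers chosen for illustration. It shows why a region with strong nonspecific signal can still be called as a peak when the target protein is knocked out.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    target: float      # antibody bound to the intended target protein (blue)
    off_target: float  # antibody bound to non-target proteins (orange)
    structure: float   # antibody bound to polynucleotide structures, e.g. R-loops (red)


def observed_signal(region: Region, knockout: bool = False) -> float:
    """Sum the three sources (additivity is an assumption of this sketch).

    In a knock-out sample the target protein is absent, so only the
    error terms (orange + red) remain.
    """
    target = 0.0 if knockout else region.target
    return target + region.off_target + region.structure


def call_peaks(regions, threshold: float, knockout: bool = False):
    """Return names of regions whose observed signal reaches the threshold."""
    return [r.name for r in regions if observed_signal(r, knockout) >= threshold]


# Hypothetical regions: the error term is independent of the target signal.
regions = [
    Region("true_site", target=8.0, off_target=0.5, structure=0.5),
    Region("hot_region", target=1.0, off_target=4.0, structure=3.0),
    Region("background", target=0.0, off_target=0.5, structure=0.5),
]

wild_type_peaks = call_peaks(regions, threshold=5.0)
knockout_peaks = call_peaks(regions, threshold=5.0, knockout=True)

print("wild type:", wild_type_peaks)
print("knock-out:", knockout_peaks)
```

In this toy setting the hypothetical `hot_region` is called in both samples, while `true_site` disappears in the knock-out, which mirrors the pattern described above: peaks that persist without the target protein are driven by the error terms rather than by specific binding.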
For more details check out our: